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ABSTRACT 

This technical paper describes a software package that was designed to produce initial conditions for 
large cosmological simulations in the context of the Horizon collaboration. These tools generalize E. 
Bertschinger's Graf icl software to distributed parallel architectures and offer a flexible alternative to 
the Graf ic2 software for "zoom" initial conditions, at the price of large cumulated cpu and memory 
usage. The codes have been validated up to resolutions of 4096^ and were used to generate the initial 
conditions of large hydrodynamical and dark matter simulations. They also provide means to generate 
constrained realisations for the purpose of generating initial conditions compatible with, e.g. the local 
group, or the SDSS catalog. 
Subject headings: Cosmology: numerical methods 



L INTRODUCTION 

Numerical simulations have proved to be valuable 
tools in the field of cosmology and galaxy formation. 
They provide a mean to test theoretical assumptions, 
to predict the properties of large scale structures (and 
galaxies within) and give access to synthetic observa- 
tions without sacrifying the whole complexity that arise 
from non-linearities. Thanks to the recent progresses in 
terms of numerical techniques and available hardware, 
numerical cosmology has become one of the most im- 
portant (and CPU consumming) field among the scien- 
tific topics that require extreme computing. Over the 
last few years, a series of large simul ations ha ve been 
produced by, among others, Gen and pstrikerl ([2000), 
the Vir^o Consortium (Frenk et al. 2000, the Millenium: 
ISpringel et al. 2005). Weinberg et al. (2002 ). the Gaso- 
line team (|Wadslev et al.l l2004l ). Following the same 
route, the purpose of the Horizon Project^ is to fed- 
erate numerical simulations activities within the french 
comunity on topics such as : the large scale structure 
formation in a cosmological framework, the formation 
of galaxies and the prediction of its observational signa- 
tures. The collaboration studies the influence on the pre- 
dictions of the resolution, the numerical codes, the self- 
consistent treatment of the baryons and of the physics 
included. 

These investigations are performed on initial con- 
ditions (ICs thereafter) that share the same phases 
and their production is described in the current pa- 
per. The large Horizon ICs involves two boxes of 
50 and 2000 Mpc/h comoving size with respectively 
1024^ and 4096^ initial resolution elements (particles 
or grid points), following a ACDM concordance cos- 
mogony. They were generated from an existing set 
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of initial c onditions created for the 'Mare Nostrum' 
simulation (|Gottlober and YepesI 120071 ): they share the 
same phases but with different box sizes and resolu- 
tions. The (50/i~^Mpc, 1024^) ICs were used as inputs 
to the AMR code RAMSES (jTevssierl [20Q2h in a sim- 
ulation that included dark matter dynamics, hydrody- 
namics, star formation, metal enrichment of the gas and 
feedback. This simulation directly compares to the Mare- 
Nostrum simulation in terms of cosmology and physics 
and it will be refe red as the HoRlzON-M areNostrum 
simulation hereafter (|Ocvirk. Pichon and Tev ssier 20Q8h. 
The (2000/i"^Mpc,4096^) ICs serve d as a starting point 
for the H0RlZ0N-4n simul ation (Tevssier et al.l I20Q7L 
|http : //www . pro j et-horizon . f r) ) : it is a pure dark 
matter simulation and assumes a cosmology constrained 
by WMAP3. It is currently used to investigate the full- 
sky gravitational lensing signal that could be observed 
by the DUNE experiment (hence the 411). 

The paper is organised as follows: first, we briefly ex- 
plain the principle of the ICs' generation. Then we de- 
scribe how the phases were extracted from the MareNos- 
trum ICs in order to make the Horizon ICs consistent 
with this reference. We describe next the features of a 
series of codes used to generate and process the different 
Horizon ICs: 

• mpgraf ic: ICs generation with optional low- 
frequency constraints 

• constrf ield: Low-frequency ICs generation with 
point-like constraints. 

• degraf : Low-pass filtering and resampling of ICs 

• powergrid: ICs empirical power spectrum estima- 
tion. 

• split grafic: Estimation of matter density on a 
grid from a set of particle positions, and Peano- 
Hilbert domain decomposition. 

Finally we illustrate how these codes were implemented 
on the two HORIZON simulations. 

2. RANDOM FIELD FOR COSMOLOGICAL 
INITIAL CONDITIONS 



2.1. Grid-based initial conditions 

For completeness, we quickly review the principle of 
ICs generation. Most of t he fo ll owing has been strongly 
inspir ed from articles by iPenI (|l997l) and .Bertschinger. 
(|2QQlf ). Let us consider the initial 3D gaussian random 
field (^(x), representing the density or the displacements, 
and let us define its Fourier transform (^(k). If we con- 
sider zero- mean fields, they are completely defined by 
their correlation function or, equivalently, by their power 
spectra, P{k)\ 

P(fc)Jz5(k-k') = (<5(k)5*(k')). (1) 

All the statistical information in a gaussian homogeneous 
and isotropic realization is contained in this quantity and 
the difficulty of generating initial conditions resides in ob- 
taining a field which has the correct power spectrum. 
We chose to follow the convolution-based method de- 
scribed by e.g lPenI (|l997l ) and define the correlation ker- 
nel in Fourier space as: 

A{k) = ^/m. (2) 

To reproduce the correlation function accurately one may 
need to first convolve the power spectrum with the win- 
dow that describes the simulation box, as advocated by 
|Pen (199 7). The infiuence of the box size on rms density 
in spheres of given radius (which are relevant for mass 
function estimates of collapsed objects), is negligible for 
sphere radii much smaller than the box length, even for 
simulations designed to study galaxy cluster scales. 

Then the ICs generation is a two-step procedure. First 
a normal, uncor related random field of unit variance is 
generated in position space^. This white noise ni(x) has 
a constant power spectrum, e.g.: 

(|ni(k)|2) = l. (3) 

The second step involves convolving the white noise with 
the correlation kernel in order to obtain the initial fiuc- 
tuation field: 

,5(x)=ni(x)*A(x), (4) 

or in Fourier space 

,5(k)=ni(k)^(k). (5) 

It can be easily seen that (|(^(k)^|) = P{k) and the initial 
field automatically has the correct power spectrum. 

One of the main virtue of the method resides in the 
possibility of using the same white noise for different 
power spectra. In other words, it decouples explicitely 
the phases (which contain the specificities of a given re- 
alization in terms of relative positions) from the ampli- 
tudes of the fiuctuations (corresponding to one's favorite 
choice of cosmological model). A change in the physics 
or in the box size results in a change of the convolu- 
tion kernel, but the underlying structure of the field will 
remain globally the same for a given white noise realiza- 
tion. Conversely, Eq. [5] implies that the initial phases 
can be recovered from a set of ICs, provided that the 
convolution operation can be inverted. In other words, 
it is possible to generate a new set of ICs from an old 
one (see Section H]) and such a set would share the same 
overall structures with e.g. a modified cosmology or box 



^ We could have directly generated the real and imaginary part 
of each (5(k) following a A/'(0, l/v^) law, saving the cost of an 
extra Fourier transform, but we chose to remain compatible with 
the Graf ic code. 



2.2. Grid-based versus particle-based initial conditions 

In numerical simulations, the dark matter distribution 
is almost exclusively described in terms of particles and 
this discretized description is also applied to the gas in 
SPH-like hydrodynamical codes. Consequently, dealing 
with 'particle- type' data is the most frequent case while 
the current procedure naturally deals with densities and 
velocities sampled on a grid. This can be easily tackled 
by recalling the density- velocity relation that is valid in 
the linear regime : 



1 



VxU = -/(^^,^a)(5(x). 



(6) 



Here u stands for the comoving peculiar velocity, x for 
the comoving position, H for the Hubble constant, a for 
the scale factor and / is defined as the logarithmic time 
derivative of the growth factor: 



f{Qrn,^A) 



dlogL)+ 
dloga 



(7) 



Functional fits for / can be found in the literature 
(e.g. Lahav et al. 1991) or be directly computed for 
a given cosmology. Hence, assuming that particles were 
displaced from a regular grid and knowing their veloc- 
ities, the initial density field can be directly recovered 
from Eq. (|6]). One can see that the (eulerian) positions 
of particles are not directly involved in this procedure; 
however, their lagrangian coordinates are used to remap 
the particle velocities to grid cells. 

3. GRID-BASED ICS: TECHNICAL 
IMPLEMENTATION 

Once the "phases" (the white noise) are chosen on a 
grid of a given size, it is possible to use them to generate 
initial density and velocity realizations with the desired 
cosmology and power spectrum, at resolutions that can 
differ from the initial white noise realization. 

According to linear theory, which applies for initial 
conditions ^, density and velocity divergence are related 
through Eq. (|6]), so that a single white noise realization 
determines b oth density and vel ocities on a grid of equal 
size (see e.g. 'Berts chingerll2QQ ih . 

A first numerical implementation of this algorithm was 
made by E. Bertschinger in the package Graf icl. We 
extend here his code using the Message Passing Inter- 
face (MPI) library to deal with large simulation cubes 
on distributed memory platforms. Another implemen- 
tation of MPI-based Initial Conditions generat or in the 
context of the GRACOS code is described in IShirokovl 
(|2QQ7[ ). http : //www . graces . org, 

We also develop a few tools (low-pass filtering and resam- 
pling, power spectrum estimation, estimation of matter 
density on a grid from a set of particle positions) that 
work as well in parallel environments. We describe these 
tools and their usage in the following subsections. 

3.1. Mpgrafic: a parallel version of Graf id 

Generating initial conditions for cosmological simula- 
tions on a grid from an initial white noise realization 

^ actually, the validity of the linear theory is enforced by chosing 
the starting redshift in such a way that the resulting variance of 
the discrete density field is significantly less than unity 
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Fig. 1. — The left image shows a shce of a density field realization of size 256"^. The middle and right images show respectively slices 
of the cubes of size 128'^ and 64"^ obtained from the first density field by lowpass filtering and subsampling. The images have been rescaled 
to the same size to ease their comparison. The initial density field was obtained using mpgrafic with the parameters of Table [2] The 
low-frequency, resampled fields were obtained with the degraf utility. 



is theoretically quite simple, as it involves a straightfor- 
ward implementation of Equations ([5]) and (|6]) in Fourier 
space. Indeed, the main issues in the Graf icl code in- 
volve getting the normalization right (in terms of a^ or 
Qrms, which are input parameters), and therefore in the 
cosmology routines. 

3.1.1. Description 

Difficulties of a more technical nature appear as the 
size of the desired cubes (and/or the number of particles) 
grow and that a single cube does not fit into a computer's 
memory. An elegant answer to this problem, in the con- 
text of mul ti-resolution ( ^ |zoom " ) initial conditions was 
designed by iBertschingerl (|2QQll ) and implemented in the 
Graf ic2 package. This solution involves generating a low 
resolution cube first, and successively adding higher fre- 
quencies in nested sub-regions, constrained by the phases 
of the already existing low frequency modes. Strictly 
speaking, the exact solution to this problem is (naively) 
as costly as the direct generation of the full cube at the 
highest resolution, but approximate, less costly solutions 
based on anti-aliasing kernels can be designed. This is 
precisely what is done in the Graf ic2 package. 

The main advantage of this solution is to produce 
multi-resolution initial conditions with a very reasonable 
memory usage, but it also has drawbacks, namely its 
complexity, and its built-in restrictions on the nested 
cubes structure of a given maximum size. Given the 
growing size of computer clusters, our "brute force" ap- 
proach to the problem based on parallelism becomes pos- 
sible, and it is also in some ways more flexible. First of 
all, it allows for a direct generation of global initial con- 
ditions for very large cosmological simulations. Secondly, 
it also allows, together with associated tools for low-pass 
filtering and resampling, to create multi-resolution simu- 
lations of a more general structure by simply extracting 
the desired sub-regions from the series of "downgraded" 
cubes (obtained by low-pass filtering and resampling of 
the initial large high-resolution cube). 

There are two issues that arise when implementating 
a parallel (MPI-based) version of the Graf icl software. 
The first involves performing efficient three-dimensional 
fast fourier transforms (FFTs thereafter); this difficulty 
is solved by using the parallel version of the FFTW ^ 

^ http://www.fftw.org 



library, which uses a slab domain decomposition. The 
second difficulty lies in the input /output. Indeed, we de- 
cided to keep the binary (Fortran) structure of the files 
in the Graf icl format. In a parallel environment where 
each MPI process is responsible for one chunk of data, 
this lead us to write part of the I/O subroutines in C us- 
ing reentrant read/ write routines, wrapped in fortran90 
to be callable from the main program. 

Finally, apart from the parallelism of mpgrafic, we 
have added a few new features to the original Graf icl 
code. An implementation of the matter power spectrum 
with baryon oscilla tions was introduced, as described in 
lEisenstein and Hul (J,998,). Secondly, the possibility of 
constraining the low frequency phases of the density and 
velocity realizations was implemented, with the input of 
a given white noise cube of lower resolution. This al- 
lowed us to use the same set of initial condition phases for 
cosmological simulations at different resolutions. Lastly, 
we have implemented the possibility of constraining the 
value of the density or velocity field values, as well as 
their gradients and hessians at a chosen set of positions. 
We will come back in more details on this last point in 
a following section, as it is a non-trivial extension of the 
code. 

3.1.2. Code installation and parameter file structure 

A prerequisite to use mpgrafic is to have (of 
course) a valid MPI library installed, including a for- 
tran90 compiler. A second prerequisite is to have the 
fftw-2.1.5 library installed, with the — enable-mpi 
— enable-type-pref ix options at the configure step. 
The first option builds the static and shared FFTW MPI- 
based libraries, the second is for the type (float or double) 
naming scheme of the libraries. 

Note that the default build of FFTW corresponds to 
double precision, whereas the default type in mpgrafic 
is single precision. To change to single precision real- 
izations, you need to make the single precision FFTW 
MPI libraries by adding the — enable-float at con- 
figure time. To compile mpgrafic in double precision 
mode, you need to configure with the — enable-double 
keyword. 

The code usage has been kept as close as possible to 
the Graf icl code interface, except of course for the few 
additional options to the code. In table [2] we show an 



example of parameter file of mpgraf ic. 

Compared to the original Graf icl parameter files, the 
only differences lie in the possibility of an input power 
spectrum with baryonic oscillations (|Eisenstein and Hul 
[1998), and in the optional input of a small noise file to 
constrain the large scale phases. Otherwise, the code is 
called in the following way: 

mpirun -np < 7^proc> mpgraf ic < parameter_f ile 
Like Graf icl, it produces seven data cubes (one density 
file, three dark matter velocity files, and three baryon 
velocity files). In the file graf icl . inc, the offsets of the 
velocity fields is controlled by the parameters offvelb, 
of fvelc. The redshift of the realizations is controlled by 
the variance of the density field on the grid, as specified 
by sigstart. Finally, the size of the cubes is controlled 
by the parameters npl,np2,np3, that can take different 
values and are set in graf icl . inc. Note that when these 
parameters are changed the code needs recompilation. 

3.1.3. Illustration on a small example, power spectrum 

estimate 

In figure [T] we show a slice of a density file realization 
with the parameter file displayed in Table [21 but with- 
out large scale constraints, and for npl=np2=np3=256. 
In the same figure, we show the density files of linear 
grid size 128 and 64 obtained from this realization, by 
low-pass filtering and subsampling. This operation has 
been done using the utility called degraf , that makes use 
of the FFTW parallel Fourier transform routines. Writ- 
ten in fortran90, it takes as input a collection of files in 
Graf icl format, as well as some parameters in a namelist 
file. These parameters include the list of the input file 
names, the target resolution of the output files, as well 
as an optional shift vector allowing a global translation 
of the output files (with periodic initial conditions). 

Finally, another utility, powergrid, uses the FFTW 
parallel Fourier transform routines to compute the peri- 
odogram estimate of a density field power spectrum. It 
allows correction for nearest grid point (NGP) or cloud 
in cell (CIC, linear) interpolation of a particles set to the 
computation grid. Note that this resampling of a discrete 
density field given by particles onto grid cells leads not 
only to a smoothing of the grid Fourier modes (this can 
be corrected by the code) but also to some power alias- 
ing, that cannot be corrected, unless prior knowledge of 
the density power spectrum is available. These points 
will be illustrated in the next section, for the Horizon 
simulations. Of course, none of these problems appear if 
one is only interested in computing the periodogram of 
the IC grid-sampled density fields. Figure [2] shows the 
theoretical power spectrum corresponding to the density 
field realization shown in Figure [H together with its pe- 
riodogram estimate, as well as the periodograms of its 
low-passed, resampled versions. 

3.2. Constrained initial conditions 

Since mpgraf ic opens the opportunity of generating 
ICs which are consistent with a given low frequency cube, 
it is interesting to build such a cube of phases so that the 
overall cube satisfy (low frequency) point like constraints 
on a given set of points. These constraints fix the mean 
value of the density or velocity field, as well as their gradi- 
ents and hessians, computed at a chosen set of positions, 
for a given smoothing length Rp (see EquationfTTl below) . 
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Fig. 2. — Theoretical power spectrum (solid line, as output in 
the file power.dat by mpgraf ic), together with the periodogram 
estimates (computed by powergrid), of the 256^ density realization 
(top thick line), and its downgraded versions produced by degraf 
(middle and bottom lines for 128^and 64^ respectively). 

Once such a low-frequency cube has been generated, it is 
then whitened, and set as an input to mpgraf ic, which 
is then used to add small scale power and to resample 
the field according to the new Nyquist frequency ^^. 

This procedures allows for instance to generate initial 
conditions which are consistant with, say a given merging 
event, or the structure of the Local Group, the large scale 
structures derived from large surveys such as the SDSS 
(lAdelman-McCarthv and for the SPSS Collaboration! 
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Group 
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Jarrett et all 120031). the 

Mohavaee and Tullvl l2005[ ). etc. 

The ensemble of unconstrained gaussian random fields 
is defined by all possible realizations of the field values 
J(x), or their Fourier amplitudes (^(k), for a given power 
spectrum. If we impose the constraints on the field, the 
statistical ensemble narrows down to a subset of realiza- 
tions, those that have the constraints satisfied. In par- 
ticular, for a discrete representation of the field on a grid 
of size A^^, this means that not all N^ values of 5(x) are 
real degrees of freedom. Averaging over the constrained 
ensemble (. . .)c makes both the mean {5{x.))c and the 
variance dependent on position x. 

We shall be dealing with linear constraints each 
of which, in general, sets a linear functional relation 
14[^(x)] (we will use latin letters to index the constraints 
where each constraint is defined by a grid position x^ and 
a constraint operator Y^, see below) between field values 
to a given value, 14 [^(x)] = Va. 

Alternatively, we can take a point of view that such a 
restricted ensemble defines a new (constrained) random 

^^ In principle, adding small scale power with mpgraf ic, while 
keeping the same low- frequency phases, violates the point-like con- 
straints imposed by constrf ield. However, if the smoothing kernel 
W(kRp) and its associated smoothing length are chosen in such 
a way that it cuts all modes with wave vectors above the initial 
Nyquist frequency, the constraints are not violated by adding small 
scale power. 
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V[6] 
V[6] 
V[6] 
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: (V,V,A-1 
: ViVjS 



density value 
linear displacement 
^Sij)6 fiow shear 

density gradient 
density curvature 



TABLE 1 

Different type of point -like constraints 



field (5c (x) which is still gaussian (due to the linearity 
of imposed relations) but is statistically inhomogeneous. 
Under these conditions, the well known way to construct 
a constrained field from unconstrained realizations is 

=(x) = 6{^)+Y,{S{^)Va[S]){Va[5]Vb[d])-^ (Vb - Vb[5]) . 



ah 



(see e.g. iBardeen et allll986l : iHoffman and RibaklfTool . 
In this expression, the random quantities on the right- 
hand side are unconstrained 5{x). Here V5 is a nu- 
merical value of the constraint 6, while {5{'s)Va[5]) is 
the covariance between the field and a constraint, and 
Cah = {Va[S]Vb[S]) is the convariance matrix between the 
constraint functionals. 




Fig. 3. — Example of contrained realization generated using 
CONSTRFIELD and extended at a 1024^ resolution using mpgraf ic. 
Here a regular grid of 4 x 4 x 4 density peaks is imposed within 
a ACDM cosmological simulation in a box of length 100h~^Mpc. 
Here the constraints are of the density type, so that Y(k) = 1 for 
each constraint; in such a case, Va(S) = 5(xa), where delta(xa) is 
the chosen constraint at position Xa- 



This recipe reproduces the mean (note, the averaging 
is taken over all unconstrained realizations) 

(Jc(x)) = {6{^)Va){VaVb)-^Vb , (9) 

and the correlation function 

(5e(x)<5c(x'))=C(x,x') - (J(x)V;)(Kn)-'(5(x')n) 

+ {4(x))(<5e(x')>, (10) 

which define all statistical properties for the gaussian 
case. 



In cosmology the primary interest is to define con- 
straints that describe the physical properties of a patch 
(see e.g. iBond and Mverslll996l : Ivan de Wevgaertlll996D 
of the density field - density, density derivatives, shear 
fiow, averaged over the volume of the patch. Such con- 
straints can be represented in Fourier space as 



Va[S] = / d''kSik)YaO<.)W{kRp)e 



-ikxa 



(11) 



where x^ is the position of the patch, W{kRp) is a av- 
eraging filter over the patch size Rp, and Ya{l<~) is the 
Fourier representation of the operator that specifies the 
constraint functional. In particular, we use the con- 
straint ty pes given in Table [TJ Using constrained field 
formalism lBond et al.l ([1996) have demonstrated that the 
observed filamentary Cosmic Web of matter distribution 
in the Universe can be understood as dynamical enhance- 
ment of the geometrical properties of intial density field. 
The web is largely defined by the position and primor- 
dial tidal fields of rare events in the medium, such as 
precursors of large galaxies at high redshifts or clusters 
of galaxies at present time, with the strongest filaments 
between nearby clusters whose tidal tensors are nearly 
aligned. 

The code constrf ield implements the same cosmol- 
ogy as mpgraf ic (including baryon wiggles) and offers 
the possibility of whitening ^^ the resulting field in or- 
der to feed it to mpgraf ic as a low frequency input. An 
example of this procedure is illustrated in Fig. [3l 

3.3. Splitting the ICs 

Starting a parallel computation requires the initial con- 
ditions to be dispatched over all the computing processes. 
Two alternatives exist to perform this operation. The 
first one involves having the initial conditions to be read 
by a 'master' process and the data to be broadcasted 
to all the other processes, and then keep or reject the 
broadcasted data according to the topology of the com- 
putation's domain split. While simple to implement, this 
option happens to be difficult to use in practice since 
broadcasting over a large number of processes can be 
technically problematic and time consuming. 

We present an alternative option which involves having 
the processes upload their own set of data only. Because 
it is wasteful for each process to parse the whole set of ini- 
tial conditions to get the relevant subset of data, this op- 
tion implies that initial conditions are pre-split according 
to the domain decomposition strategy of the integrator. 

^^ Here we understand by whitening the operation that trans- 
forms an unconstrained field into a white noise, i.e. a collection 
of independent indentically distributed random variables A/'(0, 1). 
Note that the presence of constraints breaks the independence of 
the grid cells even after "whitening" (see Equation I10|) . 
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This splitting is both a domain boundary assignment, a 
procedure for distributing particles among the processes, 
and a per process file dump. 

It results in a faster procedure, since each process reads 
its own set of data, instead of having one process read- 
ing the full initial conditions. A possible drawback lies in 
the fact that a splitting is defined a priori; changing the 
number of processes dedicated to the computation there- 
fore requires the production of a whole new set of split 
initial conditions. However splitting can be performed on 
a single process, prior to any parallel computation, and 
exhibits a negligible cost in terms of CPU-time. Such 
a procedure can thus be applied an arbitrary number of 
times at almost no expense. 

Horizon simulations were started from split initial con- 
ditions, where each process uploaded its own set of data. 
The splitting scheme followed the domain decomposi- 
tion's strategy of the cosmological calculations performed 
with RAMS ES. It relies on a 31) Peano-Hilbert space 
filling curve (|Salmon and WarrenI Il997l : iMacNeice et al.l 
[2000 ) which provides a complete mapping between the 
3D position of a grid point and a ID coordinate on this 
curve. A two-dimensionnal example of such a curve is 
shown in figure [H By using this piecewise linear rep- 
resentation of the computation domain, each process is 
being given a continuous portion of this curve and load 
balancing is achieved by 'sliding' the limits of the lo- 
cal data sets along the space-filling curve. In particular, 
the initial conditions are by construction well-balanced, 
therefore the splitting among all the processes involves 
a set of even subdivisions of the Peano-Hilbert curve. A 
(i, j, k) grid point maps to a single key q. A set of suc- 
cessive (^1, ^2, •••, ^n) corresponds to a single process p. 
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Fig. 4. — Example of a ICs splitting following a Peano-Hilbert 
domain decomposition in two dimensions. 



Note that all sub-domains are simply connected, i.e. 
within there are no isolated sub-regions owned by a pro- 
cess in the middle of another region owned by another 
process. For instance, in 3D, if the curve is split in 2^ 
sections, each section fits in a 3D rectangle of different 
sizes and orientations (see figure [5j). Moreover, if the 
curve is split in 8^ parts, all the sections fit in a cube of 
the same volume. This type of domain decomposition is 



known to be most efficient if we consider the ensemble of 
all the refined grids configurations. It may be surpassed 
by other strategies (e.g. layer splitting, angular splitting) 
on specific situations, but Peano-Hilbert domain decom- 
position remains the best strategy on average. 




Fig. 5. — Example of a ICs splitting in 16 subvolumes following 
a Peano-Hilbert domain decomposition as seen from four different 
directions. 

Here we implemented a fast and simple algorithm 
which performs the splitting of initial conditions accord- 
ing to the Peano-Hilbert domain decomposition. It relies 
on a plane- by-plane investigation of the data cube which 
limits the memory consumption. Let n^ be the number 
of cells in the initial conditions data cube, F{i^j^k) the 
value of the 3D field at grid indices 1 < i,j,k < n and / a 
2D slice of F at index k. We assume that the number of 
processes nproc is a power of two (even though this con- 
straint can be lifted as explained below); it ensures that 
each process domain is a 3D rectangle. Consequently 
the extent of the sub-domain corresponding to a process 
p is given by two triplets {imjm, kmY and {im.Jm, kuY 
which correspond to the two extreme corners of the rect- 
angle. The algorithm described below runs on one pro- 
cess only, and involves two distinct steps: 

• do /c = 1, n : loop over data planes 

— read / = F(l : n, 1 : n, /c) : the 2D data is 
uploaded. 

— initialisations: 

* process(l : nproc) =. false., 

* {im,jm,kmY = "1, (iM.JM.kMY = "1, 

Vp G [1 : nproc] 

— doj = l,n,z = l,n 

* Peano-Curve mapping : (i,j,k) ^ q ^ p 

* process p is found in the k plane: pro- 
cess(p)=.true. 

* if process p found for the 1st time set the 
minimum position : if {im-,jm-, kmY — ~^ 
then {imJm.kmY = ihjik) 



* update the maximum position: 
if {ij,k) > {iM^JM^kuY then 
{iM^JM^kuY = {ij,k) 

— do p = l,nproc 

* if process(p)=. false, skip : only processes 
detected in the plane are taken in ac- 
count. 

* write f(i^ : i^, j^ : j^) in the file of the 
process p. 

First, initial conditions fields are parsed plane by plane 
and for each plane, the process map is achieved through 
the space-filling curve mapping. Then for each process 
found in the current plane the subset of data is writ- 
ten in the relevant files. Because of the compacity of 
the sub-domains, a significant speedup can be obtained 
by parsing the i and j indexes of the second loop with 
steps larger than 1 : this procedure is safe as long as 
the step remains smaller than the smallest extent of the 
sub-domains along one direction. We call this step the 
speedup step. Finally the nproc= 2^ constraint can be 
lifted if a set of processes upload the same set of data. 
Each process would load a small initial condition file 
and would retain only its 'sub-sub-domain' within a sub- 
domain. For instance the Horizon 411 simulation ran on 
6144 = 3 X 2048 processes: the splitting was performed 
over 2048 sub-domains and each sub-file was uploaded 
by three processes. Overall, this algorithm can be quite 
effective and as an illustration, the 4096^ initial condi- 
tions fields of this simulation were split in 15 minutes on 
a single process of the CCRT computing center using a 
speedup step of 256. The code ran on Itanium2 proces- 
sors (double-core, but only a single core has been used 
here) with a 1.6 GHz frequency. 

4. APPLICATION TO LARGE HORIZON 
SIMULATIONS 

Let us now illustrate on a couple of large scale simula- 
tion how mpgraf ic was used. We will consider in turn 
a hydrodynamical and a dark matter only simulation. 

4.1. Horizon- MareNostrum 

The first major application of the mpgraf ic code was 
the generation of ICs for a simulation of a cosmological 
hydrodynamical simulation of linear size 50/i~^Mpc on a 
grid of size 1024^. 

For the Mare Nostrum simulation, we started for tech- 
nical reasons with external ICs (given as a set of particle 
velocities), as one of the goals was to compare two n- 
body plus hydro codes (namely the RAMSES and GAD- 
GET2 codes) on similar ICS. The particle velocities were 
therefore read from external ICs and we performed the 
derivation of the density field samples on the grid from 
the particle velocities in Fourier space, using the JMFFT 
Fourier Transform package^^. The / factor involved in 
Eq. ([7j) was computed using routines provided in the 
original Graf icl package. Once the velocity divergence 
and /are known, the initial density field is easily recov- 
ered and ready to be used as an input for simulations 
(especially for the hydrodynamical part of RAMSES), or 
as a source for a specific set of initial phases. 

^^ http://www.idris.fr/data/publications/JMFFT/ 
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Fig. 6. — The power spectrum P(k) of the 'Mare Nostrum' ini- 
tial conditions. Symbols stand for the measured power spectrum, 
Pmes(k), while the dotted line stands for the theoretical power spec- 
trum Pin(^)- The dashed line, PHan(^) , stand for the theoretical 
power spectra plus a Hanning filter contribution. The solid line, 
pfinal (^) stand for our best fit of the power spectrum. The bottom 
panel represents a zoomed version of the top one where only the 
small scales are shown. 



4.1.1. Power Spectrum Extraction from Mare Nostrum. 
Initial Conditions 

The master equation (Eq. ([5])) can only be inverted 
knowing the convolution kernel, i.e. the power spec- 
trum P{k). In principle, the knowledge of the cosmology 
and the included physics should be sufficient to derive 
P{k) prior to the deconvolution. Let us call Pin{k), this 
theoretical power spectrum constrained only by physics. 
In practice, this theoretical power spectrum differs from 
the effective power spectrum used to draw the (exter- 
nal, particle based) ICs. Our goal in this section is to 
define a power spectrum Pfinal (^) that should accurately 
represent the ensemble averaged power spectrum of the 
external ICs (so that (Pmes(^)) ^ Pfinai(^))), based on 
the available theoretical and measured power spectra. 
Pfinal (^) will then be used in the deconvolution. 

Using an inaccurate spectrum to deconvolve Eq. ([5]) 
would lead to a 'colored' noise for the initial phases, i.e. 





Fig. 7. — Comparison between the gas in the SPH MareNostrum simulation at redshift z = 5.7 {left) from which ICs the Horizon white 
noise was extracted, and the Horizon-MARENOSTRUM simulation [right) for which the initial conditions were generated with mpgraf ic. 
This figure demonstrates that on large scales the phases are indeed reproduced by the procedure. It also shows that on these scales, the 
two code produce quite similar features. 



with spurious characteristic length scales. 

Let us illustrate the discrepancy between the theoret- 
ical and the measured P{k) by describing the power 
spectrum of the Mare Nostrum ICs. The measured 
P(k) is shown in figure [6] as triangles. The theoretical 
P{k) = Pin{k) (i.e. the one used to generate this set of 
ICs) is also shown as a dotted line and unsurprisingly, 
the two curves disagree. 

At low /c, the finite volume of the simulation implies 
that the empirical power spectrum of large scale modes 
has large sampling variance and thus departs from the 
theoretical curve. The high k discrepancy is of different 
nature: clearly the sampling variance is negligible, but 
now the discreetness of the grid and anisotropy of very 
high k mode representation play role. P{k) departs sig- 
nificantly from Pin{k) as easily seen when zooming on 
the large k regions, where P{k) lacks power compared to 
the expected behavior. Therefore, Pin{k) cannot be used 
without corrections to whiten the external IC set. In the 
following, we rely on the fact that Gaussian initial con- 
ditions are statistically characterized by power spectrum 
only. 

The exact set of corrections depends on how the field 
have been generated. For Mare Nostrum ICs we must 
first include a Hanning filter defined in the Fourier space 

by 

Wnik) = cos(^), (12) 

Here the Nyquist frequency is given by kjsf = 2i\ jL x A/'/2, 
where L = 50 Mpc/h is the size of the box of the Mare 
Nostrum ICs and N stands for the (linear) number of 
grid elements. Such a filtering is frequently encountered 
when dealing with initial conditions: because Fourier 
modes are sampled on a cartesian grid, the two condi- 
tions k < 7rN\^/L and (kx^ky^k^) < ttN/L imply that 
anisotropics arise on the smallest scales along the diag- 
onals. The Hanning filter damps high frequency modes, 
and reduces the small scales contributions and conse- 
quently the anisotropics. In Fig. [6l we display the Puan 



curve as a dashed line, where: 

PHan=WH{kfPin{k). 



(13) 



Clearly, PHan(^) reproduces well the measured power 
spectrum (except at high frequencies, see below), with 
N = 2048, which corresponds to the original resolution 
of the external ICs, prior to some (external) degrada- 
tion procedure. This modification of the spectrum cor- 
responds to the most favorable case where an analytic 
expression is known or can be found for the filtering ap- 
plied on the data. 

Secondly, Fig. [6] shows that PHan(^) still lack some 
power for /c > 40 h/Mpc. This feature corresponds to 
the external degradation procedure: one particle out of 
eight was provided, out of the original 2048^ particles, 
resulting in power aliasing at high frequency. This part of 
the power spectrum was fitted by a smoothed version of 
the measured power spectrum. We call Pfinai(^) this final 
power spectrum that includes the effect of the Hanning 
Filter and corrects the high frequencies effects due to the 
degradation: 

Pfinal(^)= Pu.n{k) k < 40h/MpC, (14) 

5[P^e.(^)]k>40h/Mpc, (15) 

where S[X] stands for a smoothing operator. By using a 
smoothed version of the spectrum, we avoid overfitting of 
the fiuctuations in the spectrum, which would artificially 
reduce the variance at these scales. 

To conclude, Pfinai(^) combines both an analytic ex- 
pression of the filtered theoretical spectrum at low fre- 
quencies and a numerical evaluation at high frequencies, 
based on the measured power spectrum. We emphasize 
that these choices are by no means unique but were found 
to provide phases with the proper spectrum. 

4.1.2. Whitening 

The "whitening" operation (i.e. getting the white noise 
ni(x) from 5{x.) and Pfinai(^)) is then performed by de- 
convolution as in Eq. ([5]) . For the Horizon Mare Nostrum 



ICs, the power spectrum of the resulting white noise is 
shown in Fig. [SI The whitened ICs are now ready to be 
processed into new initial conditions: the whitened ICs 
serve as low frequency constraints when generating the 
refined ICs using mgraf ic 




k [h/Mpc] 

Fig. 8. — The power spectrum of the phases contained in the 
'Mare Nostrum' initial conditions, with the expectation value of the 
power of a white noise of unit variance substracted . As expected 
from a white noise's realisation, the spectrum fluctuates around an 
overall flat line. 

The code mpgrafic was then used to generate the 
ICs of a ACDM hydrodynamical "full physics" simula- 
tion (with star formation and metals) with a boxsize of 
50h~^Mpc on a grid of 1024^ using the Horizon ref- 
erence white noise. The comparison between the input 
phases and the reproduced phases with mpgrafic is il- 
lustrated in Figure [3 which shows both simulations at 
redshift 5.7. 

4.2. Horizon 4n 

As mentionned earlier, the code mpgrafic was also 
used to generate the ICs of H0RlZ0N-4n a ACDM dark 
matter only simulation based on cosmological parame- 
ters inferred by the WMAP three-years results, with a 
boxsize of 2h~^Gpc on a grid of size 4096^. The purpose 
of this simulation is to investigate full sky weak lens- 
ing and baryonic accoustic oscillations. The 70 billions 
particles were evolved using the Particle Mesh scheme of 
the RAMSES code on an adaptively refined grid (AMR) 
with about 140 billions cells. Each of the 70 billions cells 
of the base grid was recursively refined up to 6 addi- 
tional levels of refinement, reaching a formal resolution 
of 262144 cells in each direction (roughly 7 kpc/h comov- 
ing). 

The corresponding power spectrum was measured us- 
ing powergrid, and is shown in Figure [Til Since the 
simulation snapshots involves a collection of particles, we 
had to resample them on a grid using a convolution ker- 
nel before estimating the power spectrum. The resulting 
(analytical) bias in the power spectrum was corrected; 
however, this resampling procedure leads to some power 
aliasing close to the Nyquist frequency that cannot be 
corrected without additional information. Finally, note 




Fig. 9. — A multi resolution view of HORIZON 411. The outer re- 
gion corresponds to a view of the universe on scales of 16/i~-'^Gpc: 
it is generated by unfolding the simulation while cuting a slice 
obliquely through the cube in order to preserve the continuity of 
the field thanks to the periodicity. The intermediate region cor- 
responds to a slice of 2h~^Gpc, while the inner region is at the 
original resolution the initial conditions. RAMSES has refined 6 
times over the course of the run from that resolution. 



C 1.00 




Fig. 10. — The measured baryon wiggles at z = together with 
the corresponding fit (scaling like exp( — [/c/O.l]-*^-^) sin(27rA://cA) 
plus some linear drift in k), which finds that 27r//cA = 113/i~-'^Mpc. 



that the AMR structure of the RAMSES code leaves 
the opportunity of measuring the power spectrum at 
frequencies beyond the Nyquist frequency of the 4096^ 
grid. Such measurements are outside the scope of this 
paper, and the power spectrum tool powergrid should 
be viewed primarily as a diagnosis tool. Here, as a check 
of both the initial condition generation algorithm, includ- 
ing the implementation of the baryon wiggles, a novelty 
of this implementation, figure [10] displays the measured 
baryon wiggles in the z = snapshot, together with the 
corresponding fit. 
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Parameters 


Description 


4 


Power spectrum parametrization: 4 is Eisenstein &; Hu 


0.24,0.76,73.0 


Qm^A^Ho 


0.042 


n, 


0.96 


ns 


-0.92 


Normalization: —ag if negative, Qrms if positive 


0.01,100.0 


kmin and kmax in h~^Mpc for analytical PS output 


-50.0 


Box length in h~^Mpc if negative, mesh length in Mpc if positive 


1 


Graf icl mode: no choice here 





Graf icl mode: no choice here 


1 


1: Generate noise file and save it / 2: Read from noise file 


1234 


Initial seed (useful if 1 is set above) 


white-256.dat 


Noise file name 


1 


1: Constraint of large scale phases with small noise file / 2: No padding 


white-128.dat 


Small (constraint) noise file name 



TABLE 2 

MPGRAFIC PARAMETER FILE EXAMPLE. 



5. CONCLUSION 

A series of tools to construct and validate initial con- 
ditions for large (n > 1024^) cosmological simulations in 
parallel were presented and illustrated. These tools in- 
volve ICs generation with optional constraints, low-pass 
filtering and resampling, power spectrum estimation, es- 
timation of matter density on a grid from a set of parti- 
cle positions and Peano-Hilbert domain decomposition. 
As illustrated in section [H they allowed us to produce 
very large cosmological simulations. From these high- 
resolution ICs, one can then create at will, zoom-like ini- 
tial conditions and constrained ICs with the help of the 
resampling tool. Let us emphasize that mpraf ic pro- 
vides an alternative route to initial condition generation 
such as Graf ic2. It is more versatile as it does not im- 
pose any relationship between resolution and boundaries 
for the refined sub volumes. The remaining limitation 
is the total amount of memory available on distributed 
architectures. A logical extension of this work will be 
to generate initial conditions corresponding to the local 



group. Note finally that with simple amendments, the 
above mentionned code could be used in the context of 
vector field generation (magnetic field with a given he- 
licity), or turbulence. 
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Fig. 11. — A Measure of the power spectrum with powergrid of 
the H0RlZ0N-4n in the mpgrafic generated initial condition (bot- 
tom curve) and at redshift zero for different samphngs (top curves 
for resp. 512^, 1024*^ and 4096^ as labeled), with the modes k in 
units of h~^Mpc. Here the density is resampled on the grids using 
the nearest grid point kernel (NGP) whose bias is corrected. How- 
ever, the shot noise bias was not corrected in this figure, and no 
attempt was made to correct the power aliased by the resampling 
procedure. Both measurements are compared to resp. the linear 
theory and the theoretical predictions of Smith et al. (2003). Note 
that the agreement between the linear theory and the generated 
initial conditions is excellent for all measured scales. 



